Suicide is a critical issue that affects individuals and communities worldwide. In this study, we analyze a dataset of demographic and economic information to gain insight into suicide patterns. By understanding the underlying factors and risks, we aim to contribute to effective prevention strategies that promote mental well-being and foster a compassionate society. Together, communities can address this issue and work toward saving lives and building supportive environments.
| Feature Name | Description |
|---|---|
| country | The name of the country where the data is recorded. |
| year | The year in which the data is recorded. |
| sex | The gender (male or female) for which the data is reported. |
| age | The age group to which the data corresponds. |
| suicides_no | The number of suicides reported for a specific group. |
| population | The population count for a specific group. |
| suicides/100k pop | The number of suicides per 100,000 population. |
| HDI for year | The Human Development Index value for a specific year. |
| gdp for year | The gross domestic product (GDP) for a specific year. |
| gdp per capita | The GDP per capita, representing the economic output per person. |
| generation | The generational group to which individuals belong. |
import pandas as pd
import numpy as np
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import seaborn as sns
%matplotlib inline
sns.set(rc={'figure.figsize': [11, 4]}, font_scale=0.7)
df = pd.read_csv('master.csv')
df.sample(10)
| | country | year | sex | age | suicides_no | population | suicides/100k pop | country-year | HDI for year | gdp_for_year ($) | gdp_per_capita ($) | generation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 24581 | Sweden | 2004 | female | 25-34 years | 32 | 570904 | 5.61 | Sweden2004 | NaN | 381,705,425,302 | 44831 | Generation X |
| 12743 | Israel | 2012 | male | 25-34 years | 57 | 568580 | 10.02 | Israel2012 | 0.890 | 257,296,579,579 | 36263 | Millenials |
| 8930 | Finland | 2003 | male | 35-54 years | 376 | 783442 | 47.99 | Finland2003 | NaN | 171,071,106,095 | 34701 | Boomers |
| 7242 | Czech Republic | 2002 | male | 55-74 years | 282 | 921139 | 30.61 | Czech Republic2002 | NaN | 81,910,771,994 | 8399 | Silent |
| 7654 | Denmark | 2013 | female | 25-34 years | 16 | 322819 | 4.96 | Denmark2013 | 0.923 | 343,584,385,594 | 64831 | Millenials |
| 5581 | Chile | 2011 | female | 35-54 years | 158 | 2367343 | 6.67 | Chile2011 | 0.821 | 252,251,992,029 | 15854 | Generation X |
| 25597 | Trinidad and Tobago | 2008 | female | 55-74 years | 4 | 95285 | 4.20 | Trinidad and Tobago2008 | NaN | 27,870,257,894 | 22857 | Silent |
| 22022 | Serbia | 2002 | male | 15-24 years | 69 | 516180 | 13.37 | Serbia2002 | NaN | 16,116,843,146 | 2258 | Millenials |
| 21357 | Saint Lucia | 1991 | male | 25-34 years | 2 | 10359 | 19.31 | Saint Lucia1991 | NaN | 513,753,818 | 4194 | Boomers |
| 27441 | Uruguay | 2005 | female | 75+ years | 18 | 131946 | 13.64 | Uruguay2005 | 0.756 | 17,362,857,684 | 5655 | Silent |
df.shape
(27820, 12)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27820 entries, 0 to 27819
Data columns (total 12 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   country             27820 non-null  object
 1   year                27820 non-null  int64
 2   sex                 27820 non-null  object
 3   age                 27820 non-null  object
 4   suicides_no         27820 non-null  int64
 5   population          27820 non-null  int64
 6   suicides/100k pop   27820 non-null  float64
 7   country-year        27820 non-null  object
 8   HDI for year        8364 non-null   float64
 9   gdp_for_year ($)    27820 non-null  object
 10  gdp_per_capita ($)  27820 non-null  int64
 11  generation          27820 non-null  object
dtypes: float64(2), int64(4), object(6)
memory usage: 2.5+ MB
df.describe()
| | year | suicides_no | population | suicides/100k pop | HDI for year | gdp_per_capita ($) |
|---|---|---|---|---|---|---|
| count | 27820.000000 | 27820.000000 | 2.782000e+04 | 27820.000000 | 8364.000000 | 27820.000000 |
| mean | 2001.258375 | 242.574407 | 1.844794e+06 | 12.816097 | 0.776601 | 16866.464414 |
| std | 8.469055 | 902.047917 | 3.911779e+06 | 18.961511 | 0.093367 | 18887.576472 |
| min | 1985.000000 | 0.000000 | 2.780000e+02 | 0.000000 | 0.483000 | 251.000000 |
| 25% | 1995.000000 | 3.000000 | 9.749850e+04 | 0.920000 | 0.713000 | 3447.000000 |
| 50% | 2002.000000 | 25.000000 | 4.301500e+05 | 5.990000 | 0.779000 | 9372.000000 |
| 75% | 2008.000000 | 131.000000 | 1.486143e+06 | 16.620000 | 0.855000 | 24874.000000 |
| max | 2016.000000 | 22338.000000 | 4.380521e+07 | 224.970000 | 0.944000 | 126352.000000 |
df.describe(include='O')
| | country | sex | age | country-year | gdp_for_year ($) | generation |
|---|---|---|---|---|---|---|
| count | 27820 | 27820 | 27820 | 27820 | 27820 | 27820 |
| unique | 101 | 2 | 6 | 2321 | 2321 | 6 |
| top | Mauritius | male | 15-24 years | Albania1987 | 2,156,624,900 | Generation X |
| freq | 382 | 13910 | 4642 | 12 | 12 | 6408 |
# Drop 'country-year': it only concatenates the 'country' and 'year' columns
df.drop('country-year', axis=1, inplace=True)
df.head()
| | country | year | sex | age | suicides_no | population | suicides/100k pop | HDI for year | gdp_for_year ($) | gdp_per_capita ($) | generation |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Albania | 1987 | male | 15-24 years | 21 | 312900 | 6.71 | NaN | 2,156,624,900 | 796 | Generation X |
| 1 | Albania | 1987 | male | 35-54 years | 16 | 308000 | 5.19 | NaN | 2,156,624,900 | 796 | Silent |
| 2 | Albania | 1987 | female | 15-24 years | 14 | 289700 | 4.83 | NaN | 2,156,624,900 | 796 | Generation X |
| 3 | Albania | 1987 | male | 75+ years | 1 | 21800 | 4.59 | NaN | 2,156,624,900 | 796 | G.I. Generation |
| 4 | Albania | 1987 | male | 25-34 years | 9 | 274300 | 3.28 | NaN | 2,156,624,900 | 796 | Boomers |
# Strip surrounding whitespace from column names and replace spaces with underscores
df.columns = df.columns.str.strip().str.replace(' ','_')
df.columns
Index(['country', 'year', 'sex', 'age', 'suicides_no', 'population',
'suicides/100k_pop', 'HDI_for_year', 'gdp_for_year_($)',
'gdp_per_capita_($)', 'generation'],
dtype='object')
# checking for null values
df.isna().sum()
country                   0
year                      0
sex                       0
age                       0
suicides_no               0
population                0
suicides/100k_pop         0
HDI_for_year          19456
gdp_for_year_($)          0
gdp_per_capita_($)        0
generation                0
dtype: int64
df.duplicated().sum()
0
df.nunique()
country                 101
year                     32
sex                       2
age                       6
suicides_no            2084
population            25564
suicides/100k_pop      5298
HDI_for_year            305
gdp_for_year_($)       2321
gdp_per_capita_($)     2233
generation                6
dtype: int64
df['country'].unique()
array(['Albania', 'Antigua and Barbuda', 'Argentina', 'Armenia', 'Aruba',
'Australia', 'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain',
'Barbados', 'Belarus', 'Belgium', 'Belize',
'Bosnia and Herzegovina', 'Brazil', 'Bulgaria', 'Cabo Verde',
'Canada', 'Chile', 'Colombia', 'Costa Rica', 'Croatia', 'Cuba',
'Cyprus', 'Czech Republic', 'Denmark', 'Dominica', 'Ecuador',
'El Salvador', 'Estonia', 'Fiji', 'Finland', 'France', 'Georgia',
'Germany', 'Greece', 'Grenada', 'Guatemala', 'Guyana', 'Hungary',
'Iceland', 'Ireland', 'Israel', 'Italy', 'Jamaica', 'Japan',
'Kazakhstan', 'Kiribati', 'Kuwait', 'Kyrgyzstan', 'Latvia',
'Lithuania', 'Luxembourg', 'Macau', 'Maldives', 'Malta',
'Mauritius', 'Mexico', 'Mongolia', 'Montenegro', 'Netherlands',
'New Zealand', 'Nicaragua', 'Norway', 'Oman', 'Panama', 'Paraguay',
'Philippines', 'Poland', 'Portugal', 'Puerto Rico', 'Qatar',
'Republic of Korea', 'Romania', 'Russian Federation',
'Saint Kitts and Nevis', 'Saint Lucia',
'Saint Vincent and Grenadines', 'San Marino', 'Serbia',
'Seychelles', 'Singapore', 'Slovakia', 'Slovenia', 'South Africa',
'Spain', 'Sri Lanka', 'Suriname', 'Sweden', 'Switzerland',
'Thailand', 'Trinidad and Tobago', 'Turkey', 'Turkmenistan',
'Ukraine', 'United Arab Emirates', 'United Kingdom',
'United States', 'Uruguay', 'Uzbekistan'], dtype=object)
px.histogram(df['country'])
df['year'].unique()
array([1987, 1988, 1989, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,
2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010,
1985, 1986, 1990, 1991, 2012, 2013, 2014, 2015, 2011, 2016],
dtype=int64)
sns.countplot(data=df, x='year')
<AxesSubplot: xlabel='year', ylabel='count'>
df['sex'].unique()
array(['male', 'female'], dtype=object)
sns.countplot(data=df, x='sex')
<AxesSubplot: xlabel='sex', ylabel='count'>
df['age'].unique()
array(['15-24 years', '35-54 years', '75+ years', '25-34 years',
'55-74 years', '5-14 years'], dtype=object)
sns.countplot(data=df, x='age')
<AxesSubplot: xlabel='age', ylabel='count'>
df['suicides_no'].unique()
array([ 21, 16, 14, ..., 5503, 4359, 2872], dtype=int64)
sns.boxplot(data=df, x='suicides_no')
<AxesSubplot: xlabel='suicides_no'>
df['population'].unique()
array([ 312900, 308000, 289700, ..., 2762158, 2631600, 1438935],
dtype=int64)
sns.boxplot(x=df['population'])
<AxesSubplot: xlabel='population'>
df['suicides/100k_pop'].unique()
array([ 6.71, 5.19, 4.83, ..., 47.86, 40.75, 26.61])
sns.boxplot(x=df['suicides/100k_pop'])
<AxesSubplot: xlabel='suicides/100k_pop'>
df['HDI_for_year'].unique()
array([ nan, 0.619, 0.656, 0.695, 0.722, 0.781, 0.783, 0.694, 0.705,
0.731, 0.762, 0.775, 0.811, 0.818, 0.831, 0.833, 0.836, 0.632,
0.605, 0.648, 0.721, 0.723, 0.728, 0.733, 0.865, 0.882, 0.898,
0.927, 0.93 , 0.932, 0.933, 0.935, 0.764, 0.794, 0.815, 0.853,
0.879, 0.881, 0.884, 0.885, 0.609, 0.64 , 0.778, 0.78 , 0.774,
0.786, 0.727, 0.816, 0.819, 0.817, 0.821, 0.824, 0.7 , 0.716,
0.753, 0.765, 0.793, 0.785, 0.683, 0.796, 0.798, 0.806, 0.851,
0.874, 0.866, 0.883, 0.886, 0.889, 0.888, 0.89 , 0.644, 0.664,
0.701, 0.71 , 0.711, 0.715, 0.724, 0.576, 0.608, 0.702, 0.737,
0.742, 0.746, 0.752, 0.755, 0.686, 0.696, 0.713, 0.749, 0.773,
0.779, 0.782, 0.827, 0.849, 0.861, 0.867, 0.892, 0.903, 0.909,
0.91 , 0.912, 0.654, 0.699, 0.788, 0.814, 0.83 , 0.832, 0.573,
0.596, 0.629, 0.679, 0.706, 0.718, 0.72 , 0.623, 0.652, 0.682,
0.704, 0.75 , 0.756, 0.761, 0.766, 0.807, 0.653, 0.685, 0.73 ,
0.776, 0.772, 0.768, 0.769, 0.8 , 0.848, 0.852, 0.85 , 0.847,
0.863, 0.868, 0.87 , 0.862, 0.902, 0.908, 0.92 , 0.921, 0.923,
0.631, 0.645, 0.665, 0.674, 0.698, 0.717, 0.732, 0.522, 0.566,
0.603, 0.638, 0.658, 0.662, 0.666, 0.719, 0.838, 0.855, 0.859,
0.857, 0.869, 0.878, 0.741, 0.825, 0.887, 0.672, 0.735, 0.74 ,
0.747, 0.754, 0.801, 0.906, 0.911, 0.915, 0.916, 0.759, 0.799,
0.864, 0.739, 0.483, 0.513, 0.552, 0.611, 0.617, 0.624, 0.626,
0.627, 0.542, 0.581, 0.618, 0.63 , 0.634, 0.802, 0.823, 0.828,
0.826, 0.896, 0.897, 0.899, 0.77 , 0.803, 0.895, 0.893, 0.894,
0.738, 0.829, 0.856, 0.873, 0.872, 0.65 , 0.671, 0.729, 0.791,
0.891, 0.69 , 0.804, 0.795, 0.809, 0.812, 0.615, 0.562, 0.593,
0.614, 0.639, 0.655, 0.67 , 0.813, 0.837, 0.839, 0.805, 0.88 ,
0.822, 0.575, 0.647, 0.777, 0.748, 0.877, 0.919, 0.922, 0.82 ,
0.905, 0.907, 0.625, 0.628, 0.917, 0.931, 0.94 , 0.941, 0.942,
0.944, 0.714, 0.564, 0.579, 0.604, 0.646, 0.668, 0.669, 0.677,
0.84 , 0.843, 0.676, 0.844, 0.841, 0.703, 0.751, 0.691, 0.697,
0.757, 0.771, 0.736, 0.743, 0.767, 0.763, 0.876, 0.613, 0.643,
0.651, 0.659, 0.663, 0.725, 0.845, 0.597, 0.692, 0.707, 0.709,
0.901, 0.904, 0.846, 0.924, 0.925, 0.928, 0.539, 0.572, 0.684,
0.726, 0.673, 0.688, 0.913, 0.667, 0.79 , 0.594, 0.661, 0.675])
df['gdp_for_year_($)'].unique()
array(['2,156,624,900', '2,126,000,000', '2,335,124,988', ...,
'51,821,573,338', '57,690,453,461', '63,067,077,179'], dtype=object)
df['gdp_per_capita_($)'].unique()
array([ 796, 769, 833, ..., 1964, 2150, 2309], dtype=int64)
sns.histplot(data=df, x='gdp_per_capita_($)')
<AxesSubplot: xlabel='gdp_per_capita_($)', ylabel='Count'>
df['generation'].unique()
array(['Generation X', 'Silent', 'G.I. Generation', 'Boomers',
'Millenials', 'Generation Z'], dtype=object)
sns.catplot(data=df, x="generation", kind="count", palette="ch:.25")
<seaborn.axisgrid.FacetGrid at 0x224f19eadf0>
Observation of Features
After examining each feature in the dataset, the box plots show extreme values in every numeric column. These values appear to be genuine observations that follow the overall patterns in the data rather than incorrect data points, so they are kept.
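The box plots above flag points visually. As a rough numeric companion, the share of values outside the usual box-plot whiskers can be counted with the 1.5 × IQR rule. This is a minimal sketch on toy values; the helper name `iqr_outlier_share` and the default `k=1.5` are assumptions for illustration, not part of the notebook.

```python
import pandas as pd


def iqr_outlier_share(series: pd.Series, k: float = 1.5) -> float:
    """Return the fraction of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    outside = (series < q1 - k * iqr) | (series > q3 + k * iqr)
    return float(outside.mean())


# Toy values (not taken from master.csv): one large value among small ones.
toy = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
share = iqr_outlier_share(toy)
print(share)
```

On the notebook's data the same helper could be applied column by column, for example `iqr_outlier_share(df['population'])`.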
df.isna().sum() / df.shape[0]*100
country                0.000000
year                   0.000000
sex                    0.000000
age                    0.000000
suicides_no            0.000000
population             0.000000
suicides/100k_pop      0.000000
HDI_for_year          69.935298
gdp_for_year_($)       0.000000
gdp_per_capita_($)     0.000000
generation             0.000000
dtype: float64
The 'HDI_for_year' column is dropped because approximately 70% of its values are missing.
df.drop('HDI_for_year', axis=1, inplace=True)
df.isna().sum()
country               0
year                  0
sex                   0
age                   0
suicides_no           0
population            0
suicides/100k_pop     0
gdp_for_year_($)      0
gdp_per_capita_($)    0
generation            0
dtype: int64
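The drop above is done by hand for one column. A more general version of the same step might select every column whose missing share exceeds a cut-off. The sketch below uses a toy frame, and the 0.5 threshold (`max_missing`) is an assumed value; the notebook does not state one.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values) with one mostly-empty column.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [np.nan, np.nan, np.nan, 0.8],
    "c": [1.0, np.nan, 3.0, 4.0],
})

max_missing = 0.5  # assumed cut-off; the notebook does not state one
missing_share = toy.isna().mean()  # fraction of NaN per column
to_drop = missing_share[missing_share > max_missing].index.tolist()
cleaned = toy.drop(columns=to_drop)
print(missing_share.to_dict(), to_drop)
```

Here only column `b` (75% missing) is removed, while `c` (25% missing) is kept.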
| Column Name | Description |
|---|---|
| suicides_rate | 'suicides/100k_pop' divided by 100. This rescales the rate from suicides per 100,000 population to suicides per 1,000 population; it is not a percentage. |
| gdp_total | 'gdp_per_capita_($)' multiplied by 'population'. Because 'population' is the size of a single country-year-sex-age group, this approximates the economic output attributable to that group rather than the country's total GDP. |
| suicides_per_gdp | Ratio of 'suicides_no' to 'gdp_total'. Relates the number of suicides in a group to its approximate economic output. |
# Calculate 'suicides_rate'
df['suicides_rate'] = (df['suicides/100k_pop'] / 100)
# Calculate 'gdp_total'
df['gdp_total'] = df['gdp_per_capita_($)'] * df['population']
# Calculate 'suicides_per_gdp'
df['suicides_per_gdp'] = df['suicides_no'] / df['gdp_total']
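A quick arithmetic check of what these unit conversions yield, using the first Albania row shown earlier (21 suicides, population 312,900, 6.71 per 100k). The variable names are illustrative only.

```python
rate_per_100k = 6.71  # 'suicides/100k pop' in the first row of df.head()

per_1k = rate_per_100k / 100    # what 'suicides_rate' stores: suicides per 1,000
percent = rate_per_100k / 1000  # suicides per 100 people, i.e. a true percentage
check = 21 / 312900 * 100_000   # suicides_no / population, scaled to 100k
print(round(per_1k, 4), round(percent, 5), round(check, 2))
```

Dividing by 100 therefore gives a per-1,000 rate; a percentage would require dividing by 1,000.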
# Remove thousands separators and convert the GDP strings to 64-bit integers
df['gdp_for_year_($)'] = df['gdp_for_year_($)'].str.replace(',', '', regex=False).astype('int64')
df['gdp_for_year_($)'].unique()
array([ 2156624900, 2126000000, 2335124988, ..., 51821573338,
57690453461, 63067077179], dtype=int64)
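As an alternative to converting after loading, `pandas.read_csv` accepts a `thousands` argument that parses comma-grouped numbers at read time. The sketch below uses a two-row in-memory stand-in for the file; whether this works unchanged on the real `master.csv` is an assumption that has not been checked here.

```python
import io

import pandas as pd

# Two-row stand-in for master.csv; the quoted GDP values mimic its format.
csv_text = (
    'country,gdp_for_year ($)\n'
    'Albania,"2,156,624,900"\n'
    'Albania,"2,126,000,000"\n'
)

parsed = pd.read_csv(io.StringIO(csv_text), thousands=",")
print(parsed["gdp_for_year ($)"].tolist())
```

Note that `thousands` applies to every numeric-looking column in the file, so it is worth confirming the resulting dtypes with `df.info()` afterwards.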
sns.boxplot(x=df['gdp_for_year_($)'])
<AxesSubplot: xlabel='gdp_for_year_($)'>
df.columns
Index(['country', 'year', 'sex', 'age', 'suicides_no', 'population',
'suicides/100k_pop', 'gdp_for_year_($)', 'gdp_per_capita_($)',
'generation', 'suicides_rate', 'gdp_total', 'suicides_per_gdp'],
dtype='object')
df.head()
| | country | year | sex | age | suicides_no | population | suicides/100k_pop | gdp_for_year_($) | gdp_per_capita_($) | generation | suicides_rate | gdp_total | suicides_per_gdp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Albania | 1987 | male | 15-24 years | 21 | 312900 | 6.71 | 2156624900 | 796 | Generation X | 0.0671 | 249068400 | 8.431419e-08 |
| 1 | Albania | 1987 | male | 35-54 years | 16 | 308000 | 5.19 | 2156624900 | 796 | Silent | 0.0519 | 245168000 | 6.526137e-08 |
| 2 | Albania | 1987 | female | 15-24 years | 14 | 289700 | 4.83 | 2156624900 | 796 | Generation X | 0.0483 | 230601200 | 6.071087e-08 |
| 3 | Albania | 1987 | male | 75+ years | 1 | 21800 | 4.59 | 2156624900 | 796 | G.I. Generation | 0.0459 | 17352800 | 5.762759e-08 |
| 4 | Albania | 1987 | male | 25-34 years | 9 | 274300 | 3.28 | 2156624900 | 796 | Boomers | 0.0328 | 218342800 | 4.121959e-08 |
df.describe()
| | year | suicides_no | population | suicides/100k_pop | gdp_for_year_($) | gdp_per_capita_($) | suicides_rate | gdp_total | suicides_per_gdp |
|---|---|---|---|---|---|---|---|---|---|
| count | 27820.000000 | 27820.000000 | 2.782000e+04 | 27820.000000 | 2.782000e+04 | 27820.000000 | 27820.000000 | 2.782000e+04 | 2.782000e+04 |
| mean | 2001.258375 | 242.574407 | 1.844794e+06 | 12.816097 | 4.455810e+11 | 16866.464414 | 0.128161 | 3.713721e+10 | 3.208183e-08 |
| std | 8.469055 | 902.047917 | 3.911779e+06 | 18.961511 | 1.453610e+12 | 18887.576472 | 0.189615 | 1.333012e+11 | 9.682094e-08 |
| min | 1985.000000 | 0.000000 | 2.780000e+02 | 0.000000 | 4.691962e+07 | 251.000000 | 0.000000 | 2.131500e+05 | 0.000000e+00 |
| 25% | 1995.000000 | 3.000000 | 9.749850e+04 | 0.920000 | 8.985353e+09 | 3447.000000 | 0.009200 | 5.641520e+08 | 8.335782e-10 |
| 50% | 2002.000000 | 25.000000 | 4.301500e+05 | 5.990000 | 4.811469e+10 | 9372.000000 | 0.059900 | 3.390442e+09 | 5.143449e-09 |
| 75% | 2008.000000 | 131.000000 | 1.486143e+06 | 16.620000 | 2.602024e+11 | 24874.000000 | 0.166200 | 1.999500e+10 | 2.154262e-08 |
| max | 2016.000000 | 22338.000000 | 4.380521e+07 | 224.970000 | 1.812071e+13 | 126352.000000 | 2.249700 | 2.515602e+12 | 2.905276e-06 |
suicide_age_data = df.groupby(['age', 'sex']).agg({
'suicides_rate':'mean'
}).reset_index().sort_values(by='suicides_rate')
fig = px.bar(suicide_age_data, x='age', y='suicides_rate', color='sex',
barmode='group', title='Suicide Rates by Age Category and Gender')
fig.show()
The visualization shows that older individuals (age 75+) have a higher average suicide rate than younger adults and young people. One possible explanation is that people face more difficulties and feel more desperate as they age. In addition, males have a higher suicide rate than females in every age group, which points to a difference in how mental health challenges affect each gender. These findings highlight the importance of targeted support for older individuals and of mental health services that address gender-specific needs.
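The bars above average the per-row rates, so a small group counts as much as a large one. A population-weighted (pooled) rate divides total suicides by total population instead. The following sketch uses made-up numbers to show how far the two can differ; it is an illustration, not a recomputation of the chart.

```python
import pandas as pd

# Toy rows (hypothetical numbers): one small and one large population group.
toy = pd.DataFrame({
    "age": ["75+ years", "75+ years"],
    "suicides_no": [10, 100],
    "population": [10_000, 1_000_000],
})
toy["rate_100k"] = toy["suicides_no"] / toy["population"] * 100_000

# Unweighted: every row has equal influence on the result.
mean_of_rates = toy.groupby("age")["rate_100k"].mean()

# Pooled: sum counts and populations first, then form the rate.
totals = toy.groupby("age")[["suicides_no", "population"]].sum()
pooled_rate = totals["suicides_no"] / totals["population"] * 100_000

print(mean_of_rates["75+ years"], round(pooled_rate["75+ years"], 2))
```

In this toy case the unweighted mean is about 55 per 100k while the pooled rate is about 10.89 per 100k, because the small group dominates the simple average.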
data_GDP_suicide_rate = df.groupby(["country", "year"]).agg({
"gdp_per_capita_($)": "mean",
"suicides_rate": "mean"
}).reset_index()
fig = px.scatter(data_GDP_suicide_rate, x='gdp_per_capita_($)', y="suicides_rate", color='year', trendline="ols")
fig.update_layout(title="GDP per Capita vs. Suicide Rate (Over Time)",
xaxis_title="GDP per Capita ($)",
yaxis_title="Suicide Rate per 1,000 Population")
The visualization suggests a negative association between GDP per capita and suicide rate: country-years with higher GDP per capita tend to have lower suicide rates. The color scale also shows GDP per capita rising over time, coinciding with lower suicide rates in later years. These patterns are consistent with economic development playing a role in reducing suicide rates and promoting mental well-being, although a scatter plot alone does not establish causation. Policymakers and organizations may therefore consider improving living standards as one part of suicide prevention and population mental health.
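The direction and strength of the trendline can be checked numerically rather than read off the plot. The sketch below computes the Pearson correlation and the least-squares slope with NumPy on toy points; the helper name `pearson_and_slope` is hypothetical.

```python
import numpy as np


def pearson_and_slope(x, y):
    """Return (Pearson r, least-squares slope) for two 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    slope, _intercept = np.polyfit(x, y, deg=1)
    return float(r), float(slope)


# Toy points (hypothetical) lying exactly on y = 10 - 2x.
r, slope = pearson_and_slope([1, 2, 3, 4], [8, 6, 4, 2])
print(round(r, 3), round(slope, 3))
```

With the notebook's aggregated frame, the inputs would be `data_GDP_suicide_rate['gdp_per_capita_($)']` and `data_GDP_suicide_rate['suicides_rate']`; a value of `r` close to zero would indicate that the association is weak even if the slope is negative.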
data_GDP_suicide_rate = df.groupby(["country", "year"]).agg({
"gdp_for_year_($)": "mean",
"suicides_rate": "mean"
}).reset_index()
fig = px.scatter(data_GDP_suicide_rate, x='gdp_for_year_($)', y="suicides_rate", color='year', trendline="ols")
fig.update_layout(title="GDP for Year vs. Suicide Rate (Over Time)",
xaxis_title="GDP for Year ($)",
yaxis_title="Suicide Rate per 1,000 Population")
The visualization suggests a negative relationship between a country's total GDP in a given year and its suicide rate: countries with lower GDP tend to have higher suicide rates, while those with higher GDP tend to have lower rates. The color scale also indicates that some countries' suicide rates improve over time.
# Create subplots with 3 rows and 1 column
fig = make_subplots(rows=3, cols=1,
subplot_titles=("By Age Group", "By Country (Top 10)", "By Sex"), vertical_spacing=0.15)
#Age graph
data_age = df.groupby(["year", "age"])["suicides_rate"].mean().reset_index()
fig_age = px.line(data_age, x="year", y="suicides_rate", color="age", title="By Age Group")
for trace in fig_age["data"]:
fig.add_trace(trace, row=1, col=1)
#Country graph
data_country = df.groupby(["year", "country"])["suicides_rate"].mean().reset_index()
top_10_countries = data_country.groupby("country")["suicides_rate"].mean().nlargest(10).index
data_country_top10 = data_country[data_country["country"].isin(top_10_countries)]
fig_country = px.line(data_country_top10, x="year", y="suicides_rate", color="country",
title="By Country (Top 10)")
for trace in fig_country["data"]:
fig.add_trace(trace, row=2, col=1)
#Sex graph
data_sex = df.groupby(["year", "sex"])["suicides_rate"].mean().reset_index()
fig_sex = px.line(data_sex, x="year", y="suicides_rate", color="sex", title="By Sex")
for trace in fig_sex["data"]:
fig.add_trace(trace, row=3, col=1)
fig.update_layout(height=800, showlegend=True)
fig.update_xaxes(title_text="Year", row=1, col=1)
fig.update_xaxes(title_text="Year", row=2, col=1)
fig.update_xaxes(title_text="Year", row=3, col=1)
fig.update_yaxes(title_text="Suicide Rate", row=1, col=1)
fig.update_yaxes(title_text="Suicide Rate", row=2, col=1)
fig.update_yaxes(title_text="Suicide Rate", row=3, col=1)
fig.show()
Age Graph: The graph shows that suicide rates increase with age, which may indicate that older individuals face more challenges and desperation. The ordering of the age groups remains relatively stable over the years.
Country Graph: The graph shows the 10 countries with the highest average suicide rates. There is an upward trend from 1990, potentially influenced by factors such as war, followed by a decline after 2005 that may reflect efforts to address these issues.
Sex Graph: The graph reveals a large and persistent gap in suicide rates between genders, with men exhibiting higher rates. Factors such as depression and societal pressures may contribute to this gap, which underlines the need for targeted mental health support for men.
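Yearly averages like those above are computed over whichever countries report in each year, so a change in the set of reporting countries can look like a trend. One way to check coverage is to count distinct countries per year. This is a sketch on a toy panel, not the notebook's data.

```python
import pandas as pd

# Toy panel (hypothetical): only country A reports in 2016.
toy = pd.DataFrame({
    "country": ["A", "B", "C", "A", "B", "C", "A"],
    "year": [2014, 2014, 2014, 2015, 2015, 2015, 2016],
})

# Number of distinct countries contributing to each year's average.
coverage = toy.groupby("year")["country"].nunique()
print(coverage.to_dict())
```

Applied to the notebook's frame, `df.groupby('year')['country'].nunique()` would show which years rest on only a few countries and should be read with caution.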
data_population_suicide_rate = df.groupby(["country", "year"]).agg(
population=("population", "mean"),
suicides_rate=("suicides_rate", "mean")
).reset_index()
top_20_countries = data_population_suicide_rate.groupby("country")["population"].mean().nlargest(20).index
data_population_suicide_rate_top20 = data_population_suicide_rate[data_population_suicide_rate["country"].isin(top_20_countries)]
fig = px.scatter(data_population_suicide_rate_top20, x="population", y="suicides_rate", color='country',
labels={"population": "Population Size", "suicides_rate": "Suicide Rate per 1,000 Population"},
title="Population Size vs. Suicide Rate (Top 20 Countries)")
fig.show()
The visualization covers the 20 most populous countries in the dataset and shows minimal correlation between population size and suicide rate. This indicates that population size alone does not explain differences in suicide rates. Other factors, such as socioeconomic conditions, mental health awareness, and cultural influences, may have a more substantial impact.
suicide_age_data = df.groupby(['age', 'generation']).agg({
'suicides_rate':'mean'
}).reset_index().sort_values(by='suicides_rate')
fig = px.bar(suicide_age_data, x='age', y='suicides_rate', color='generation',
barmode='group', title='Suicide Rates by Age Category and Generation')
fig.show()
The visualization shows that the G.I. Generation and the Silent Generation have higher suicide rates than other generations. This could be because they lived through difficult times such as World War I, the Great Depression, and World War II, which may have led to more feelings of despair. Because each generation is observed only in certain age groups, part of this gap may also reflect age rather than generation. In contrast, Generation Z, Generation X, and Millennials have lower suicide rates, possibly because they grew up in a time of technological advancement and positive societal change, which could have provided more support for their mental well-being.
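Since the chart groups by both age and generation, it helps to see which combinations actually occur before comparing generations. A cross-tabulation makes the overlap explicit. The rows below are toy data chosen for illustration; the real pattern should be read from `pd.crosstab(df['generation'], df['age'])`.

```python
import pandas as pd

# Toy rows (hypothetical): each generation appears in only some age bands.
toy = pd.DataFrame({
    "age": ["75+ years", "75+ years", "15-24 years", "15-24 years", "55-74 years"],
    "generation": ["G.I. Generation", "Silent", "Generation X", "Millenials", "Silent"],
})

# Rows are generations, columns are age bands, cells are row counts.
counts = pd.crosstab(toy["generation"], toy["age"])
print(counts)
```

Cells with a count of zero mark age bands in which a generation is never observed, which is where a generation effect cannot be separated from an age effect.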
top_10_countries = df.groupby('country')['suicides_rate'].mean().nlargest(10).index
filtered_data = df[df['country'].isin(top_10_countries)]
data_gdp_per_capita = filtered_data.groupby(['country', 'year'])['gdp_per_capita_($)'].mean().reset_index()
fig = px.line(data_gdp_per_capita, x='year', y='gdp_per_capita_($)', color='country',
title='GDP per Capita Variation Across Top 10 Countries with Highest Suicide Rates')
fig.show()
The visualization shows how GDP per capita changed from 1985 to 2015 in the ten countries with the highest average suicide rates. Between 1985 and 2000, GDP per capita stayed fairly consistent, meaning there were no major changes in average income per person. From 2000 onward, GDP per capita rose noticeably, suggesting that the economies of these countries started to grow and improve.
# Group the data by generation, year, and calculate average values
grouped_data = df.groupby(['generation', 'year']).agg({
'suicides_rate': 'mean',
'gdp_per_capita_($)': 'mean'
}).reset_index()
# Create a scatter plot of the generation-year averages, colored by generation
fig = px.scatter(grouped_data, x='gdp_per_capita_($)', y='suicides_rate', color='generation',
labels={'gdp_per_capita_($)': 'GDP per Capita ($)', 'suicides_rate': 'Suicide Rate'},
title='Suicide Rate vs. GDP per Capita by Generation (Over Time)')
fig.show()
The visualization shows that the G.I. Generation is associated with lower GDP per capita and a higher suicide rate, while later generations are associated with higher GDP per capita and tend to have lower suicide rates. Generation Z has the highest GDP per capita and the lowest suicide rate. This suggests that greater economic prosperity may be related to better mental well-being across generations, although generation, age, and calendar year are closely linked in this data.
data = df.groupby('country')['suicides_rate'].mean().reset_index()
fig = go.Figure(data=go.Choropleth(
locations=data['country'],
locationmode='country names',
z=data['suicides_rate'],
colorscale='reds',
colorbar_title='Suicide Rate',
))
fig.update_layout(
title='Suicide Rate by Country',
geo=dict(showframe=False, showcoastlines=False, projection_type='equirectangular'),
)
fig.show()
The map shows that the Russian Federation stands out with one of the highest suicide rates, along with neighboring European countries, which could be attributed to historical events such as wars or periods of economic hardship. In contrast, countries such as Australia and the United States exhibit relatively lower suicide rates. These variations may be influenced by a combination of historical, cultural, and socioeconomic factors that affect mental health and well-being.
cols = ['suicides_no', 'population', 'suicides/100k_pop',
'gdp_for_year_($)', 'gdp_per_capita_($)', 'suicides_rate',
'gdp_total', 'suicides_per_gdp']
correlation_matrix = df[cols].corr()
fig = px.imshow(correlation_matrix, text_auto=True)
fig.update_layout(title="Correlation Matrix",
xaxis=dict(title="Columns"),
yaxis=dict(title="Columns"))
fig.show()
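The matrix above includes both 'suicides/100k_pop' and 'suicides_rate', which differ only by a factor of 100 and are therefore perfectly correlated by construction. A sketch for listing each column pair once, sorted by absolute correlation, makes such redundant pairs easy to spot. The toy frame and column names `a`, `b`, `c` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical): 'b' is 'a' divided by 100, 'c' is unrelated.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [0.01, 0.02, 0.03, 0.04, 0.05],
    "c": [2.0, -1.0, 0.5, 3.0, -2.0],
})
corr = toy.corr()

# Keep the strict upper triangle so each pair appears once, without the diagonal.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna().sort_values(key=abs, ascending=False)
print(pairs)
```

The first entry is the rescaled pair `('a', 'b')` with a correlation of 1; in the notebook, the analogous pair could be reduced to a single column before interpreting the heatmap.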
The visualizations and analysis provide insight into suicide rates, GDP per capita, and the relationships between them. Older individuals have higher suicide rates, which emphasizes the need for targeted support for this age group. Males exhibit higher suicide rates across all age groups, which highlights the importance of addressing gender-specific mental health challenges. The negative association between GDP per capita and suicide rates is consistent with economic development playing a role in mental well-being. The variation in suicide rates among countries also points to the influence of historical events and socioeconomic factors. Taken together, the findings suggest that improving living standards and mental health support may contribute to suicide prevention.